Elasticsearch Query DSL Syntax Notes
I previously mentioned that I would be grinding away at Elasticsearch until the end of this year, but I've changed my plans. After this post, I will likely only add one more article on its application in .NET before wrapping things up. I originally expected to finish by the end of October, but it dragged into November. At this rate, the next post will probably be delayed until the end of the year as well.
I had intended to cover Geo Queries and Aggregations as well, but I decided it would be better to split them up. Aggregations lean more towards statistical analysis and aren't strictly related to query syntax itself; as for Geo Queries, I'll put them on hold. I've been working on this post for too long and it's becoming a bit tedious, so I'll write a separate one when I have the time and the mood.
My weight loss progress stalled from September 16th to October 9th, and in October, I inexplicably fell into a state of world-weariness, wanting only to stay home, read novels, and scroll through short videos, with no desire to go out or use my brain. I don't know if I'll fall back into that state, and I have no idea when the next post will be finished.
I previously wrote a note on Elasticsearch QueryString syntax, which mainly introduced how to use simple query strings in the query_string field. That syntax is concise and intuitive, making it very suitable for quick tests or simple search requirements. However, in actual production environments, Query DSL (Domain Specific Language) is used far more often. Query DSL is a JSON-structured query language provided by Elasticsearch, and it is considerably more powerful and flexible than Query String. This article organizes Query DSL syntax based primarily on my own test results, cross-verified against the official documentation.
Test version: Elasticsearch 9.1.5
Query DSL vs Query String
Before we begin, let's briefly explain the advantages of Query DSL over Query String:
1. More Complete Functionality
Certain query features can only be implemented using Query DSL and are not supported by Query String:
- Nested Queries: When you need to preserve the relationships between fields within nested objects, you must use the `nested` query in Query DSL.
- Geospatial Queries: Such as `geo_distance` and other geographic query functions.
- Custom Scoring: Use `function_score` to customize the relevance scoring of documents.
- Complex Boolean Logic Combinations: Flexibly combine `must`, `should`, `must_not`, `filter`, and other conditions through `bool` queries.
2. Clearer Structure
Query String:
{
"query": {
"query_string": {
"query": "title:Elasticsearch AND status:published AND created_date:[2024-01-01 TO 2024-12-31]"
}
}
}
Query DSL:
{
"query": {
"bool": {
"must": [
{ "match": { "title": "Elasticsearch" }},
{ "term": { "status": "published" }},
{ "range": {
"created_date": {
"gte": "2024-01-01",
"lte": "2024-12-31"
}
}
}
]
}
}
}
Although Query DSL looks more verbose, the structure is clearer. Each query condition has a specific type and parameters, making it easier to maintain and debug. Furthermore, Query DSL provides clearer error messages, explicitly pointing out which field or parameter is problematic.
Common Query DSL Syntax
1. Match Query - Full-Text Search
Used for full-text search; it performs tokenization and relevance scoring.
Applicable Types:
- Text fields: Tokenized, supports all advanced parameters.
- Keyword fields: Not tokenized, exact match.
- Numeric/Date/Boolean fields: Exact match, does not support parameters like `fuzziness` or `analyzer`.
Basic Query
{
"query": {
"match": {
"title": "Elasticsearch Tutorial"
}
}
}
operator Parameter
Controls the logical relationship between multiple tokens.
OR (Default)
Returns results if any of the terms match:
{
"query": {
"match": {
"title": {
"query": "quick brown fox",
"operator": "OR"
}
}
}
}
Effect: Documents containing any of the terms quick, brown, or fox will be returned.
AND
Must match all terms:
{
"query": {
"match": {
"title": {
"query": "quick brown fox",
"operator": "AND"
}
}
}
}
Effect: Documents must contain all three terms: quick, brown, and fox.
minimum_should_match Parameter
Important: This parameter is only effective when operator = "OR".
Controls the minimum number of conditions that must be met.
Positive Integer (Absolute Quantity)
{
"query": {
"match": {
"content": {
"query": "quick brown fox jumps",
"minimum_should_match": 3
}
}
}
}
Effect: At least 3 out of 4 terms must match.
Examples:
- `quick brown fox jumps` ✓ (All 4 match).
- `quick brown fox dog` ✓ (3 match: quick, brown, fox).
- `quick brown lazy dog` ✗ (Only 2 match: quick, brown).
- `the fox jumps high` ✗ (Only 2 match: fox, jumps).
Negative Integer (Allowed Missing Quantity)
{
"query": {
"match": {
"content": {
"query": "quick brown fox jumps",
"minimum_should_match": -1
}
}
}
}
Effect: At most 1 term can be missing, equivalent to requiring at least 3.
Examples:
- `quick brown fox jumps` ✓ (0 missing).
- `quick brown fox dog` ✓ (1 missing: jumps).
- `quick brown lazy dog` ✗ (2 missing: fox and jumps).
⚠️ Special Case: The minimum match count is guaranteed to be 1.
When setting -4 (missing count = total tokens) or -100% (100% missing), it will not return all data; at least 1 term must match to return results.
Examples (-4 or -100%):
- `quick dog` ✓ (1 match: quick).
- `brown cat` ✓ (1 match: brown).
- `lazy slow` ✗ (0 matches).
Percentage (Floor Rule)
{
"query": {
"match": {
"content": {
"query": "quick brown fox jumps",
"minimum_should_match": "75%"
}
}
}
}
Effect: At least 75% must match, which is at least 3 out of 4 terms (4 × 0.75 = 3).
⚠️ Calculation Rule (Floor Rule):
- `75%`: 4 × 0.75 = 3.0 → 3 terms.
- `74%`: 4 × 0.74 = 2.96 → floored to 2 terms.
- `50%`: 4 × 0.50 = 2.0 → 2 terms.
- `26%`: 4 × 0.26 = 1.04 → floored to 1 term.
- `25%`: 4 × 0.25 = 1.0 → 1 term.
Examples (75%):
- `quick brown fox jumps` ✓ (100% match).
- `quick brown fox dog` ✓ (3 match, 75% met).
- `quick brown dog cat` ✗ (Only 2 match, less than 75%).
Examples (74%):
- `quick brown dog cat` ✓ (2 match, 2.96 floored to 2).
- `quick dog cat rat` ✗ (Only 1 match).
Negative Percentage (Floor Rule)
{
"query": {
"match": {
"content": {
"query": "quick brown fox jumps",
"minimum_should_match": "-25%"
}
}
}
}
Effect: At most 25% missing, which is at most 1 term missing (4 × 0.25 = 1), equivalent to requiring at least 3.
⚠️ Calculation Rule (Floor Rule):
- `-25%`: 4 × 0.25 = 1 → At most 1 missing, requires 3.
- `-26%`: 4 × 0.26 = 1.04 → floored to 1, at most 1 missing, requires 3.
- `-74%`: 4 × 0.74 = 2.96 → floored to 2, at most 2 missing, requires 2.
- `-75%`: 4 × 0.75 = 3 → At most 3 missing, requires 1.
Examples (-25%):
- `quick brown fox jumps` ✓ (0 missing).
- `quick brown fox dog` ✓ (1 missing, meets at most 25% missing).
- `quick brown dog cat` ✗ (2 missing, exceeds limit).
Examples (-74%):
- `quick brown dog cat` ✓ (2 match, at most 2 missing).
- `quick dog cat rat` ✗ (Only 1 match, 3 missing).
Examples (-75%):
- `quick dog cat rat` ✓ (1 match, at most 3 missing).
- `lazy slow fast dog` ✗ (0 matches).
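The rules above (positive/negative integers, percentages with the floor rule, and the "at least 1" clamp) can be sketched in a few lines of Python. This is only an illustration of my test results, not Elasticsearch code; `required_matches` is a made-up helper name:

```python
import math

def required_matches(term_count: int, spec) -> int:
    """How many terms must match for a given minimum_should_match value
    (integer, negative integer, or percentage string), using the floor rule."""
    if isinstance(spec, str) and spec.endswith("%"):
        pct = int(spec[:-1])
        if pct >= 0:
            required = math.floor(term_count * pct / 100)
        else:
            # negative percentage: at most |pct|% of the terms may be missing
            required = term_count - math.floor(term_count * -pct / 100)
    elif spec >= 0:
        required = spec
    else:
        # negative integer: at most |spec| terms may be missing
        required = term_count + spec
    # the minimum match count is always clamped to at least 1
    return max(required, 1)

print(required_matches(4, "75%"))    # 3
print(required_matches(4, "74%"))    # 2 (floored)
print(required_matches(4, -1))       # 3
print(required_matches(4, "-100%"))  # 1 (clamped)
```

Running this against the examples in the tables above reproduces every case, including the -100% clamp.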
Single Condition Combination (Advanced)
⚠️ Important: How to interpret single conditions.
Format: N<VALUE or N>VALUE.
- `N<VALUE`: When token count ≤ N, use the default rule (100%); when > N, apply VALUE.
- `N>VALUE`: When token count > N, use the default rule (100%); when ≤ N, apply VALUE.
Example 1: 3<90%
{
"query": {
"match": {
"content": {
"query": "some long search query with many terms",
"minimum_should_match": "3<90%"
}
}
}
}
Interpretation:
- When query is ≤ 3 tokens: Requires 100% match (default).
- When query is > 3 tokens: Requires 90% match.
Example (Assuming query "one two three four five", 5 terms):
- `one two three four five` ✓ (100% match, 5/5).
- `one two three four dog` ✓ (80% match; passes because 5 > 3 and 5 × 0.90 = 4.5 floors to 4 required).
- `one two three dog cat` ✗ (Only 60% match, 3/5).
Example 2: 3<-1
{
"query": {
"match": {
"content": {
"query": "alpha beta gamma delta",
"minimum_should_match": "3<-1"
}
}
}
}
Interpretation:
- When query is ≤ 3 tokens: Requires 100% match.
- When query is > 3 tokens: At most 1 missing.
Example (4 terms):
- `alpha beta gamma delta` ✓ (0 missing).
- `alpha beta gamma dog` ✓ (1 missing: delta).
- `alpha beta dog cat` ✗ (2 missing: gamma and delta).
Multiple Condition Combination (Advanced)
⚠️ Important: Multiple conditions are interpreted differently from single conditions.
Format: N1<VALUE1 N2<VALUE2 ....
Multiple conditions are interpreted as "ranges" rather than "less than":
- Before the first condition: Use default rule (100%).
- Between N1 and N2: Apply VALUE1.
- After N2: Apply VALUE2.
Example: 2<-25% 9<-3
{
"query": {
"match": {
"content": {
"query": "very long search query with lots of terms",
"minimum_should_match": "2<-25% 9<-3"
}
}
}
}
⚠️ Correct Interpretation (Range approach):
- ≤ 2 tokens: 100% match (default).
- 3-9 tokens: At most 25% missing (applies first condition `-25%`).
- > 9 tokens: At most 3 missing (applies second condition `-3`).
❌ Incorrect Interpretation (Understanding via single condition logic):
- ≤ 2 tokens: apply `-25%` (Incorrect!)
- > 9 tokens: apply `-3` (Incorrect!)
Example (Assuming query of 10 terms):
- Matches 10 terms ✓ (0 missing).
- Matches 7 terms ✓ (3 missing, meets > 9 rule).
- Matches 6 terms ✗ (4 missing, exceeds limit).
Example (Assuming query of 5 terms):
- Matches 5 terms ✓ (0% missing).
- Matches 4 terms ✓ (1 missing, 5 × 25% = 1.25 → floored to 1, meets at most 1 missing).
- Matches 3 terms ✗ (2 missing, exceeds limit).
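The conditional formats can be sketched the same way. This illustration covers only the `N<VALUE` form and the range interpretation of multiple conditions described above (the `N>VALUE` form is omitted for brevity); `conditional_required_matches` is a made-up helper, not Elasticsearch code:

```python
import math

def _apply(term_count, value):
    # evaluate a simple minimum_should_match value (int or percentage string)
    if isinstance(value, str) and value.endswith("%"):
        pct = int(value[:-1])
        n = (math.floor(term_count * pct / 100) if pct >= 0
             else term_count - math.floor(term_count * -pct / 100))
    else:
        n = int(value) if int(value) >= 0 else term_count + int(value)
    return max(n, 1)

def conditional_required_matches(term_count, spec):
    """Interpret conditional specs like '3<90%' or '2<-25% 9<-3'.
    Conditions form ranges: below the first N, 100% is required;
    the last condition whose N is below term_count applies."""
    result = term_count  # default: all terms must match
    for cond in spec.split():
        n, value = cond.split("<", 1)
        if term_count > int(n):
            result = _apply(term_count, value)
    return result

print(conditional_required_matches(5, "3<90%"))        # 4
print(conditional_required_matches(10, "2<-25% 9<-3")) # 7
print(conditional_required_matches(5, "2<-25% 9<-3"))  # 4
```

Note how iterating through the conditions in order naturally yields the "range" behavior: each later condition overrides the earlier one once its threshold is exceeded.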
fuzziness Parameter
Fuzzy matching, allows for spelling errors. Only applicable to text fields.
AUTO (Recommended)
{
"query": {
"match": {
"title": {
"query": "Elasticsearc",
"fuzziness": "AUTO"
}
}
}
}
Effect: Automatically determines the allowed edit distance based on term length.
Examples:
- `Elasticsearch` ✓ (1 char difference: h).
- `Elasticsearc` ✓ (Exact match).
- `Elasticserch` ✓ (1 char difference).
- `Elastix` ✗ (Too much difference).
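Per the official documentation, the default AUTO setting is equivalent to `AUTO:3,6`: terms shorter than 3 characters must match exactly, terms of 3-5 characters allow 1 edit, and terms of 6 or more characters allow 2. As a sketch (`auto_fuzziness` is a made-up helper):

```python
def auto_fuzziness(term: str) -> int:
    """Edit distance allowed by fuzziness AUTO for a given term,
    using the default AUTO:3,6 length thresholds."""
    n = len(term)
    if n <= 2:
        return 0  # must match exactly
    if n <= 5:
        return 1  # one edit allowed
    return 2      # two edits allowed

print(auto_fuzziness("ox"))            # 0
print(auto_fuzziness("fox"))           # 1
print(auto_fuzziness("Elasticsearc"))  # 2
```

This is why the 12-character query above tolerates up to 2 edits.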
Fixed Edit Distance
{
"query": {
"match": {
"title": {
"query": "quikc brown",
"fuzziness": 1
}
}
}
}
Effect: Allows at most 1 character difference (insertion, deletion, substitution).
Examples:
- `quick brown` ✓ (quikc → quick, 1 char difference).
- `quikc brown` ✓ (Exact match).
- `qukc brown` ✗ (2 char difference).
- `qick brown` ✓ (1 char difference).
Related Parameters
{
"query": {
"match": {
"title": {
"query": "quikc brown fox",
"fuzziness": "AUTO",
"prefix_length": 2,
"max_expansions": 10,
"fuzzy_transpositions": true
}
}
}
}
Parameter Explanation:
- `prefix_length`: The first N characters must match exactly, default is `0`.
- `max_expansions`: Maximum number of candidate terms to expand during fuzzy matching, default is `50`.
- `fuzzy_transpositions`: Whether to allow adjacent character swaps (ab → ba), default is `true`.
Example (prefix_length = 2):
- `quick brown fox` ✓ (Starts with qu, matches prefix).
- `quikc brown fox` ✓ (Starts with qu, matches prefix).
- `xuick brown fox` ✗ (First 2 characters xu do not match qu).
Example (max_expansions = 10):
Suppose the index contains these terms: quick, quit, quiz, quiet, quiche, quill, quirk, quack, queue, quartz, qualify, quarrel... (20+ similar terms).
When querying qui:
{
"query": {
"match": {
"title": {
"query": "qui",
"fuzziness": 1,
"max_expansions": 10
}
}
}
}
Effect:
- Elasticsearch finds all similar terms with edit distance ≤ 1 (possibly 20+).
- Only the first 10 candidate terms are taken for searching (e.g., `qui`, `quit`, `quiz`, `quiet`, `quick`, `quiche`, `quill`, `quirk`, `quack`, `queue`).
- Other candidates (like `quartz`, `qualify`, `quarrel`...) are ignored.
Why limit this?
- Performance considerations: Expanding into dozens of candidates consumes significant computing resources, slowing down the query.
- Result quality: Too many candidates may include irrelevant results.
Example (fuzzy_transpositions = true):
- `qiuck` ✓ (ui ↔ iu, swapped).
- `qukic` ✓ (ki ↔ ik, swapped).
Example (fuzzy_transpositions = false):
{
"query": {
"match": {
"title": {
"query": "qiuck",
"fuzziness": 1,
"fuzzy_transpositions": false
}
}
}
}
- `qiuck` ✗ (ui ↔ iu swap not allowed, requires 2 edits: delete i, insert u).
- `quick` ✓ (Requires only 1 edit: replace i → u).
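The effect of fuzzy_transpositions corresponds to the difference between classic Levenshtein distance and the variant that counts an adjacent swap as a single edit (optimal string alignment). A sketch, with `edit_distance` as a made-up helper:

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Levenshtein distance; with transpositions=True an adjacent swap
    counts as one edit (optimal string alignment), mirroring the
    fuzzy_transpositions behavior described above."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[m][n]

print(edit_distance("qiuck", "quick", transpositions=True))   # 1
print(edit_distance("qiuck", "quick", transpositions=False))  # 2
```

With transpositions disabled, `qiuck` → `quick` costs 2 edits and therefore falls outside `fuzziness: 1`.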
Other Parameters
analyzer
Specifies the analyzer (defaults to the analyzer configured for the field):
{
"query": {
"match": {
"content": {
"query": "Quick Brown",
"analyzer": "standard"
}
}
}
}
lenient
Controls how to handle cases where the query value does not match the field type, default is false.
Parameter Explanation:
- `false` (default): Throws an error and the query fails if the type does not match.
- `true`: Ignores the query for that field if the type does not match; no error is thrown, but that field will have no matches.
Example 1: lenient = false (default)
{
"query": {
"match": {
"age": {
"query": "not a number"
}
}
}
}
Effect:
- Because the `age` field is numeric and the query value `"not a number"` is text, the query throws an error.
Example 2: lenient = true
{
"query": {
"match": {
"age": {
"query": "not a number",
"lenient": true
}
}
}
}
Effect:
- The query does not throw an error.
- But because the type does not match, the field will not match any documents (equivalent to the condition being ignored).
- The query executes normally, just with no results.
boost
Adjusts the relevance score weight, default is 1.0:
{
"query": {
"match": {
"title": {
"query": "Elasticsearch",
"boost": 2.0
}
}
}
}
zero_terms_query
How to handle cases where the query results in no tokens after analysis (becomes an empty query), default is none.
Parameter Explanation:
- `none` (default): Returns no documents.
- `all`: Returns all documents (equivalent to match_all).
Example 1: Empty string query
{
"query": {
"match": {
"message": {
"query": "",
"zero_terms_query": "none" // or "all"
}
}
}
}
Effect:
- `zero_terms_query: "none"`: Returns no documents.
- `zero_terms_query: "all"`: Returns all documents.
Example 2: Stop filter removes all terms
Suppose the message field uses a stop filter containing to, be, or, not (requires extra configuration), when querying "to be or not to be":
{
"query": {
"match": {
"message": {
"query": "to be or not to be",
"zero_terms_query": "none" // or "all"
}
}
}
}
Process:
- Original query: `"to be or not to be"`.
- The stop filter removes all stop words, leaving 0 tokens (the query becomes empty).
- `zero_terms_query: "none"`: Returns no documents; `zero_terms_query: "all"`: Returns all documents.
Use Cases:
- `zero_terms_query: "all"`: Search boxes that allow empty queries, or where users might only input stop words but still expect feedback.
- `zero_terms_query: "none"`: Disallows empty queries (the default in most scenarios).
WARNING
zero_terms_query is only triggered when the query truly becomes empty.
If the query terms are not removed but simply cannot be found in the index, it will return 0 results normally rather than triggering zero_terms_query. For example, if the field does not have a stop filter configured, querying "to be or not to be" will not trigger zero_terms_query, but will search for those terms normally.
2. Multi Match Query - Multi-field Search
Searches for the same keyword across multiple fields.
{
"query": {
"multi_match": {
"query": "Elasticsearch",
"fields": ["title^3", "content", "tags"],
"type": "best_fields"
}
}
}
Parameter Explanation:
- `fields`: List of fields; the number after `^` represents the weight. Fields can use wildcards, e.g., `"title"` and `"*_name"` will search `title`, `first_name`, `last_name`, etc.
- `type`: Query type.
Parameter Support by Type
| Parameter | Description | best_fields | most_fields | cross_fields | phrase | phrase_prefix | bool_prefix |
|---|---|---|---|---|---|---|---|
| `fuzziness` | Fuzzy match, allows spelling errors (supports AUTO, 0, 1, 2) | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| `prefix_length` | First N characters must match exactly (default 0) | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| `max_expansions` | Max candidate terms to expand during fuzzy match (default 50) | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| `fuzzy_transpositions` | Whether to allow adjacent character swaps (default true) | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| `fuzzy_rewrite` | Rewrite method for fuzzy queries | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| `slop` | Allowed term spacing for phrase queries | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
lenient Parameter
The lenient parameter is particularly useful in multi-field queries because different fields may have different data types.
Suppose the index has the following fields:
- `title` (text)
- `price` (integer)
{
"query": {
"multi_match": {
"query": "not a number",
"fields": ["title", "price"],
"lenient": false
}
}
}
Effect (lenient = false, default):
- The `title` field is text and can handle `"not a number"` normally.
- The `price` field is integer and cannot handle `"not a number"`.
- The query throws an error, and the entire query fails.
{
"query": {
"multi_match": {
"query": "not a number",
"fields": ["title", "price"],
"lenient": true
}
}
}
Effect (lenient = true):
- The `title` field searches `"not a number"` normally.
- The `price` field is ignored due to type mismatch; no error is thrown.
- The query executes normally, searching only in the `title` field.
Query Type Explanation
To better illustrate the differences between various query types, we use the following test data:
Test Data:
// Document 1
{
"title": "brown fox jumps",
"subject": "quick animal",
"message": "The quick brown fox"
}
// Document 2
{
"title": "quick brown",
"subject": "fox hunting",
"message": "Guide to fox hunting"
}
// Document 3
{
"title": "fast animal",
"subject": "brown bear",
"message": "The brown bear is slow"
}
best_fields (default)
Takes the score of the highest-scoring field, suitable for finding "best match in a single field".
{
"query": {
"multi_match": {
"query": "quick brown fox",
"type": "best_fields",
"fields": ["title", "subject", "message"],
"tie_breaker": 0.3
}
}
}
Internal Execution Logic (equivalent to):
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "quick brown fox" }},
{ "match": { "subject": "quick brown fox" }},
{ "match": { "message": "quick brown fox" }}
],
"tie_breaker": 0.3
}
}
}
Scoring Method:
- Takes the score of the highest-scoring field.
- If `tie_breaker` is set, it becomes: Highest Score + (Other field scores × tie_breaker).
Query Result Analysis:
Assuming "quick brown fox" is queried, the base score for each field is as follows (actual scores are affected by BM25 algorithm, term frequency, document length, etc.):
| Document | title Score | subject Score | message Score | Final Score Calculation (tie_breaker=0.3) |
|---|---|---|---|---|
| Doc 1 | 1.5 (brown, fox) | 1.0 (quick) | 5.0 (quick, brown, fox) | 5.0 + (1.5 + 1.0) × 0.3 = 5.75 |
| Doc 2 | 3.0 (quick, brown) | 1.0 (fox) | 1.0 (fox) | 3.0 + (1.0 + 1.0) × 0.3 = 3.6 |
| Doc 3 | 0 | 1.0 (brown) | 1.0 (brown) | 1.0 + 1.0 × 0.3 = 1.3 |
Calculation Logic:
- Select the highest-scoring field as the base score.
- Multiply the scores of all other matching fields by the tie_breaker and sum them up.
- Formula: Highest Score + (Sum of other field scores × tie_breaker).
Conclusion: Document 1 has the highest score because the message field contains all three terms and is the highest-scoring, while the other two fields also contribute.
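The tie_breaker formula is easy to verify with a few lines of Python, using the illustrative per-field scores from the table above (`dis_max_score` is a made-up helper, not an Elasticsearch API):

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """best_fields scoring: highest field score plus the remaining
    matching fields' scores multiplied by tie_breaker."""
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + others * tie_breaker

# per-field base scores (title, subject, message) from the table above
print(dis_max_score([1.5, 1.0, 5.0], 0.3))  # Doc 1: 5.75
print(dis_max_score([3.0, 1.0, 1.0], 0.3))  # Doc 2: 3.6
print(dis_max_score([0.0, 1.0, 1.0], 0.3))  # Doc 3: 1.3
```

With the default `tie_breaker = 0.0`, only the best field counts, which is the plain dis_max behavior.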
most_fields
Combines the scores of all fields, suitable for "multiple similar fields" (e.g., different tokenization methods for the same content).
{
"query": {
"multi_match": {
"query": "quick brown fox",
"type": "most_fields",
"fields": ["title", "subject", "message"]
}
}
}
Internal Execution Logic (equivalent to):
{
"query": {
"bool": {
"should": [
{ "match": { "title": "quick brown fox" }},
{ "match": { "subject": "quick brown fox" }},
{ "match": { "message": "quick brown fox" }}
]
}
}
}
Scoring Method:
- Sums the scores of all fields.
Query Result Analysis:
| Document | title Score | subject Score | message Score | Final Score (Sum) |
|---|---|---|---|---|
| Doc 1 | 1.5 (brown, fox) | 1.0 (quick) | 5.0 (quick, brown, fox) | 1.5 + 1.0 + 5.0 = 7.5 |
| Doc 2 | 3.0 (quick, brown) | 1.0 (fox) | 1.0 (fox) | 3.0 + 1.0 + 1.0 = 5.0 |
| Doc 3 | 0 | 1.0 (brown) | 1.0 (brown) | 0 + 1.0 + 1.0 = 2.0 |
Conclusion: Document 1 has the highest score because it matches in multiple fields.
Difference from best_fields:
The main difference between best_fields and most_fields lies in the default value of tie_breaker:
- `best_fields`: Default `tie_breaker = 0.0` (takes only the highest score).
- `most_fields`: Default `tie_breaker = 1.0` (sums all scores).
When both are set to the same tie_breaker value, the calculated scores will be the same.
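This equivalence can be checked with the same dis_max formula from the best_fields section: with `tie_breaker = 1.0` it degenerates into a plain sum of the field scores:

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Highest field score plus the other fields' scores × tie_breaker."""
    best = max(field_scores)
    return best + (sum(field_scores) - best) * tie_breaker

scores = [1.5, 1.0, 5.0]  # Document 1's per-field scores from the table
print(dis_max_score(scores, 1.0))                 # 7.5
print(dis_max_score(scores, 1.0) == sum(scores))  # True
```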
cross_fields
Cross-field search, treats multiple fields as one large field, suitable for cases like names, addresses, etc., where matches need to span fields.
Test Data (Name Example):
// Document 1
{ "first_name": "Wing", "last_name": "Chou" }
// Document 2
{ "first_name": "Chou", "last_name": "Chen" }
// Document 3
{ "first_name": "John", "last_name": "Wing" }
{
"query": {
"multi_match": {
"query": "Wing Chou",
"type": "cross_fields",
"fields": ["first_name", "last_name"],
"operator": "and"
}
}
}Execution Logic:
According to official documentation, cross_fields analyzes the query string into individual terms and then searches for each term in any of the fields, as if they were one large field.
+blended(terms:[first_name:wing, last_name:wing])
+blended(terms:[first_name:chou, last_name:chou])
This means each term can be scattered across different fields, as long as each term appears in at least one field.
Query Result Analysis:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | Wing in first_name, Chou in last_name (scattered across different fields) |
| Doc 2 | ❌ | Only Chou matches, missing Wing |
| Doc 3 | ❌ | Only Wing matches, missing Chou |
WARNING
When the search_analyzer settings for fields are inconsistent (e.g., one field has an analyzer configured and another does not), the behavior of cross_fields changes. For example, the execution logic becomes:
((+first_name:wing +first_name:chou) | (+last_name:wing +last_name:chou))
In this case, all terms must appear in the same field, rather than being scattered across different fields, behaving similarly to best_fields (but with different field ordering).
Additionally, combined_fields queries will fail if fields use different search_analyzers, so if you have custom analyzer requirements, you need to be particularly aware of this limitation.
Scoring Method:
- Blends term frequency statistics across all fields to avoid results being skewed by high term frequency in a single field.
- `tie_breaker` can be used to adjust scoring behavior (default is 0.0).
phrase
Phrase query, terms must appear in order.
Test Data:
// Document 1
{ "title": "quick brown fox", "message": "The fox is quick" }
// Document 2
{ "title": "brown quick fox", "message": "quick brown fox jumps" }
// Document 3
{ "title": "fast brown fox", "message": "A brown and quick animal" }
{
"query": {
"multi_match": {
"query": "quick brown fox",
"type": "phrase",
"fields": ["title", "message"]
}
}
}
Internal Execution Logic (equivalent to):
{
"query": {
"dis_max": {
"queries": [
{ "match_phrase": { "title": "quick brown fox" }},
{ "match_phrase": { "message": "quick brown fox" }}
]
}
}
}
Query Result Analysis:
| Document | title matches | message matches | Returns? |
|---|---|---|---|
| Doc 1 | ✅ (order correct) | ❌ (order wrong: "fox is quick") | ✅ |
| Doc 2 | ❌ (order wrong: "brown quick fox") | ✅ (order correct) | ✅ |
| Doc 3 | ❌ (extra "fast" in middle) | ❌ (terms scattered: "brown and quick") | ❌ |
Conclusion: Phrase queries require terms to appear adjacent and in order.
Using with slop parameter:
{
"query": {
"multi_match": {
"query": "quick brown fox",
"type": "phrase",
"fields": ["title", "message"],
"slop": 1
}
}
}
Query Result Changes:
| Document | title matches | message matches | Returns? |
|---|---|---|---|
| Doc 1 | ✅ | ❌ (requires slop = 2) | ✅ |
| Doc 2 | ❌ (requires slop = 2) | ✅ | ✅ |
| Doc 3 | ✅ ("fast" counts as 1 interval) | ❌ (requires larger slop) | ✅ |
phrase_prefix
Phrase prefix query, the last term can be a prefix match.
Test Data:
// Document 1
{ "title": "quick brown fox", "message": "quick brown forest" }
// Document 2
{ "title": "quick brown food", "message": "quick brown" }
// Document 3
{ "title": "fast brown fox", "message": "quick blue forest" }
{
"query": {
"multi_match": {
"query": "quick brown f",
"type": "phrase_prefix",
"fields": ["title", "message"]
}
}
}
Internal Execution Logic (equivalent to):
{
"query": {
"dis_max": {
"queries": [
{ "match_phrase_prefix": { "title": "quick brown f" }},
{ "match_phrase_prefix": { "message": "quick brown f" }}
]
}
}
}
Query Result Analysis:
| Document | title matches | message matches | Returns? |
|---|---|---|---|
| Doc 1 | ✅ (f prefix matches fox) | ✅ (f prefix matches forest) | ✅ |
| Doc 2 | ✅ (f prefix matches food) | ❌ (no term starting with f) | ✅ |
| Doc 3 | ❌ (missing "quick") | ❌ (missing "brown") | ❌ |
Conclusion: The first N-1 terms must match exactly and in order, the last term can be a prefix match.
bool_prefix
Boolean prefix query, the last term uses prefix matching, other terms use exact matching.
Test Data:
// Document 1
{ "title": "quick brown fox", "message": "forest animals" }
// Document 2
{ "title": "brown food quick", "message": "quick forest" }
// Document 3
{ "title": "fast fox", "message": "brown quick forest" }
{
"query": {
"multi_match": {
"query": "quick brown f",
"type": "bool_prefix",
"fields": ["title", "message"]
}
}
}
Scoring Method:
- Similar to `most_fields`, but uses a `match_bool_prefix` query per field.
- Supports fuzzy query parameters, but they are only effective for the non-prefix terms.
Query Result Analysis:
| Document | title matches | message matches | Returns? | Explanation |
|---|---|---|---|---|
| Doc 1 | ✅ (quick, brown, f prefix) | ✅ (f prefix matches forest) | ✅ | All terms match |
| Doc 2 | ✅ (quick, brown, f prefix matches food) | ✅ (quick, f prefix matches forest) | ✅ | Term order doesn't matter |
| Doc 3 | ✅ (f prefix matches fox) | ✅ (brown, quick, f prefix matches forest) | ✅ | Terms can be scattered across fields |
Difference from phrase_prefix:
| Feature | phrase_prefix | bool_prefix |
|---|---|---|
| Term order | Must be in order | Order not required |
| Term position | Must be adjacent | Can be scattered |
| Use case | Exact phrase search | Flexible autocomplete |
Example:
Querying "quick brown f":
- `phrase_prefix`: Must be in the order "quick brown f...".
- `bool_prefix`: Can be any order, like "brown quick f..." or "f... brown quick".
3. Combined Fields Query - Cross-Field Term Search
The combined_fields query adopts a term-centric approach, treating multiple text fields as a single combined field for searching. It is particularly suitable for cases where query terms might be scattered across multiple fields, such as an article's title, abstract, and body.
Basic Query:
{
"query": {
"combined_fields": {
"query": "database systems",
"fields": ["title", "abstract", "body"],
"operator": "and"
}
}
}
Test Data:
// Document 1
{
"title": "Database Management",
"abstract": "Modern systems overview",
"body": "Relational database concepts"
}
// Document 2
{
"title": "Information Systems",
"abstract": "Database architecture",
"body": "Design patterns"
}
// Document 3
{
"title": "NoSQL Solutions",
"abstract": "Alternative approaches",
"body": "Non-relational systems"
}
Query Result Analysis:
When querying "database systems":
| Document | Matches? | Returns? | Explanation |
|---|---|---|---|
| Doc 1 | ✅ | ✅ | "database" in title and body, "systems" in abstract |
| Doc 2 | ✅ | ✅ | "database" in abstract, "systems" in title |
| Doc 3 | ❌ | ❌ | Only "systems" matches (in body); missing "database", so it fails with operator "and" (it would match with "or") |
Main Parameters
fields (Required)
List of fields, supports wildcards. All fields must be of type text and use the same search analyzer.
{
"query": {
"combined_fields": {
"query": "quick search",
"fields": ["title^2", "content", "*_text"]
}
}
}
boost
You can use the ^ symbol to set field weights (must be ≥ 1.0, can be a decimal), or use the boost parameter to adjust the weight of the entire query:
{
"query": {
"combined_fields": {
"query": "distributed consensus",
"fields": ["title^2", "body"],
"boost": 1.5
}
}
}
Test Data:
// Document 1
{ "title": "Consensus Algorithms", "body": "Distributed systems basics" }
// Document 2
{ "title": "Network Protocols", "body": "Distributed consensus mechanisms" }
Scoring Method:
- Document 1: `title` contains "consensus" (weight × 2), `body` contains "distributed"; its overall score is higher.
- Document 2: Both terms are in `body` (no weight bonus), so its score is lower.
operator
Sets the logical relationship between terms, default is or.
- `or` (default): Matches if any term matches.
- `and`: All terms must match.
{
"query": {
"combined_fields": {
"query": "database systems",
"fields": ["title", "abstract", "body"],
"operator": "and"
}
}
}
minimum_should_match
Minimum number of matches; usage is the same as in the match query. Supports:
- Positive integer: Absolute quantity (e.g., `3`).
- Negative integer: Allowed missing quantity (e.g., `-1`).
- Percentage: `"75%"` or `"-25%"`.
- Condition combination: `"3<90%"` or `"2<-25% 9<-3"`.
For detailed explanation, please refer to the minimum_should_match parameter in the "Match Query" section.
{
"query": {
"combined_fields": {
"query": "quick brown fox jumps",
"fields": ["title", "content"],
"minimum_should_match": "75%"
}
}
}
zero_terms_query
How to handle cases where there are no tokens after analysis, default is none.
- `none` (default): Returns no documents.
- `all`: Returns all documents.
For detailed explanation, please refer to the zero_terms_query parameter in the "Match Query" section.
auto_generate_synonyms_phrase_query
Whether to automatically create phrase queries for multi-term synonyms, default is true.
{
"query": {
"combined_fields": {
"query": "quick",
"fields": ["title", "body"],
"auto_generate_synonyms_phrase_query": true
}
}
}
Effect: If "quick" has a synonym "fast running", a phrase query for "fast running" will be automatically created.
WARNING
Using the synonym feature requires configuring a synonym filter in the field's search_analyzer. However, combined_fields requires all fields to use the same search_analyzer. If the analyzer settings for the fields are inconsistent, the query will fail. Therefore, when using this parameter, ensure all queried fields use the same synonym configuration.
Execution Logic
{
"query": {
"combined_fields": {
"query": "database systems",
"fields": ["title", "abstract"],
"operator": "and"
}
}
}
Actual Execution Logic:
+(combined("database", fields:["title", "abstract"]))
+(combined("systems", fields:["title", "abstract"]))
Meaning: Each term must appear in at least one field (terms can be scattered across different fields).
Usage Limitations
- Field Type Limitation: Only supports text fields, does not support keyword, numeric, date, etc.
- Analyzer Limitation: All fields must use the same search analyzer.
- Similarity Limitation: Only supports BM25 similarity (Elasticsearch's default), does not support custom similarity or per-field similarity settings.
- Clause Count Limitation: The number of query clauses is limited by `indices.query.bool.max_clause_count` (default 4096), calculated as "number of fields × number of terms".
Example:
{
"query": {
"combined_fields": {
"query": "quick brown fox jumps",
"fields": ["title", "abstract", "body"]
}
}
}
- Number of terms: 4 (quick, brown, fox, jumps).
- Number of fields: 3 (title, abstract, body).
- Clause count: 4 × 3 = 12 (far below the 4096 limit).
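The clause-count arithmetic is simply the product of the two list lengths, which makes it easy to sanity-check a query against the limit before sending it (`clause_count` is a made-up helper):

```python
def clause_count(terms, fields):
    """combined_fields expands to roughly one clause per (term, field) pair."""
    return len(terms) * len(fields)

MAX_CLAUSE_COUNT = 4096  # default indices.query.bool.max_clause_count

terms = "quick brown fox jumps".split()
fields = ["title", "abstract", "body"]
count = clause_count(terms, fields)
print(count, count <= MAX_CLAUSE_COUNT)  # 12 True
```

The limit only becomes a practical concern with wildcard field patterns that expand to many fields combined with long queries.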
4. Match Phrase Query - Phrase Search
Must match the phrase order completely, suitable for searching fixed phrases.
{
"query": {
"match_phrase": {
"content": {
"query": "quick brown fox",
"slop": 1
}
}
}
}
Parameter Explanation:
- `query`: The phrase to search for.
- `analyzer`: Specifies the analyzer (defaults to the analyzer configured for the field).
- `boost`: Adjusts the relevance score weight, default is `1.0`.
- `slop`: Maximum number of position intervals allowed between terms, default is `0` (terms must be completely adjacent).
- `zero_terms_query`: How to handle cases where there are no tokens after analysis (`none` or `all`).
Test Data:
// Document 1
{ "content": "The quick brown fox jumps over the lazy dog" }
// Document 2
{ "content": "A quick and brown fox in the forest" }
// Document 3
{ "content": "The brown quick fox runs fast" }
Query Result (slop = 0):
{
"query": {
"match_phrase": {
"content": "quick brown fox"
}
}
}
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | Term order correct and adjacent |
| Doc 2 | ❌ | "and" in middle, not adjacent |
| Doc 3 | ❌ | Order wrong (brown quick) |
Query Result (slop = 1):
{
"query": {
"match_phrase": {
"content": {
"query": "quick brown fox",
"slop": 1
}
}
}
}
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | Term order correct and adjacent |
| Doc 2 | ✅ | 1 term in middle ("and"), meets slop = 1 |
| Doc 3 | ❌ | Order wrong, requires 2 moves to match |
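A rough way to see how slop tolerates gaps is to model it in a few lines of Python. This sketch only counts in-order gaps between the phrase terms; real sloppy matching also permits reordering at a higher move cost (which is why Doc 3 needs slop ≥ 2), so treat it as an approximation:

```python
def phrase_match(doc_tokens, phrase_terms, slop=0):
    """Simplified slop check: find the phrase terms in order and count
    the extra tokens between them. Out-of-order matching is not modeled."""
    positions, start = [], 0
    for term in phrase_terms:
        try:
            idx = doc_tokens.index(term, start)
        except ValueError:
            return False  # a term is missing entirely
        positions.append(idx)
        start = idx + 1
    gaps = (positions[-1] - positions[0]) - (len(phrase_terms) - 1)
    return gaps <= slop

doc1 = "the quick brown fox jumps over the lazy dog".split()
doc2 = "a quick and brown fox in the forest".split()
print(phrase_match(doc1, ["quick", "brown", "fox"], slop=0))  # True
print(phrase_match(doc2, ["quick", "brown", "fox"], slop=0))  # False
print(phrase_match(doc2, ["quick", "brown", "fox"], slop=1))  # True
```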
5. Term Query - Exact Match
Used for exact value queries, does not perform tokenization, matches terms directly in the index.
{
"query": {
"term": {
"status": {
"value": "published"
}
}
}
}
Parameter Explanation:
- value: The exact value to query.
- boost: Adjusts the relevance score weight, default is 1.0.
- case_insensitive: Whether to ignore case, default is false (supported since Elasticsearch 7.10+).
Applicable Types:
- Keyword fields: Matches original value exactly.
- Text fields: Matches tokenized terms, not the original text.
- Numeric, Date, Boolean: Exact value match.
Use Cases:
- Exact match for keyword fields (status, tags, IDs, etc.).
- Exact query for numeric, date, boolean values.
- Specific term query for text fields (requires understanding tokenization results).
Test Data:
// Document 1
{ "status": "published", "title": "Elasticsearch Guide" }
// Document 2
{ "status": "draft", "title": "Quick Tutorial" }

Query Example (Keyword field):
{
"query": {
"term": {
"status": "published"
}
}
}| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | status matches "published" exactly |
| Doc 2 | ❌ | status is "draft" |
Query Example (Text field):
Assuming title is a text field using the standard analyzer:
{
"query": {
"term": {
"title": "elasticsearch"
}
}
}
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | "Elasticsearch Guide" tokenized contains "elasticsearch" |
| Doc 2 | ❌ | Tokenized does not contain "elasticsearch" |
WARNING
When using term query on a text field, the query value is not tokenized, but it will match against the tokenized terms in the index. For example, querying "Elasticsearch Guide" will not match any results because the index stores tokenized "elasticsearch" and "guide", not the full string.
Recommendation: When performing full-text search on text fields, use match query instead of term query.
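The warning above can be illustrated without a cluster by mimicking what happens at index time (`standard_analyze` is a crude stand-in that only lowercases and splits; the real standard analyzer does more, such as stripping punctuation):

```python
def standard_analyze(text):
    """Crude stand-in for the standard analyzer: lowercase and split."""
    return text.lower().split()

index_terms = standard_analyze("Elasticsearch Guide")
print(index_terms)  # ['elasticsearch', 'guide']

# A term query does NOT analyze its input, so it compares raw values
# against the tokenized terms stored in the index:
print("elasticsearch" in index_terms)        # True  -> term "elasticsearch" hits
print("Elasticsearch Guide" in index_terms)  # False -> the full string never hits
```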
6. Terms Query - Multi-Value Exact Match
Matches documents whose field contains any of the listed values, similar to SQL's IN query.
Basic Usage
{
"query": {
"terms": {
"status": ["published", "draft", "pending"],
"boost": 2.0
}
}
}
Parameter Explanation:
- boost: Adjusts the relevance score weight.
- index.max_terms_count: Maximum number of terms allowed, default 65,536; adjustable via index settings.
Test Data:
// Document 1
{ "status": "published", "title": "Article 1" }
// Document 2
{ "status": "draft", "title": "Article 2" }
// Document 3
{ "status": "archived", "title": "Article 3" }

Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | status = "published" |
| Doc 2 | ✅ | status = "draft" |
| Doc 3 | ❌ | status = "archived" not in list |
Terms Lookup - Fetching values from existing documents as search conditions
When you need to search for a large number of terms, you can fetch field values from existing documents as search conditions, avoiding the need to manually list a large number of terms.
Usage Limitations:
- _source must be enabled for the lookup field.
- Does not support cross-cluster search.
- Also subject to the index.max_terms_count limitation (default 65,536).
Parameter Explanation:
- index: Name of the index where the source document resides.
- id: ID of the source document.
- path: Name of the field to fetch values from; supports dot notation for nested objects.
Example Scenario: Suppose there is an index storing article statuses, and you want to find all other documents that have the same status as a specific document.
Test Data:
// Document 1
{ "status": "published", "title": "Article 1" }
// Document 2
{ "status": "draft", "title": "Article 2" }
// Document 3
{ "status": "archived", "title": "Article 3" }

Query: Fetch the status field value from document 2 and search for all documents containing those values
{
"query": {
"terms": {
"status": {
"index": "my-index",
"id": "2",
"path": "status"
}
}
}
}
Execution Flow:
1. Elasticsearch fetches the document with ID 2 from the my-index index.
2. Reads the status field value: ["draft"].
3. Uses ["draft"] as the search condition, equivalent to executing:
{ "query": { "terms": { "status": ["draft"] } } }
Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ❌ | status = "published" does not match |
| Doc 2 | ✅ | status = "draft" |
| Doc 3 | ❌ | status = "archived" does not match |
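Conceptually, a terms lookup is just two steps — fetch a field from one document, then use its values as an ordinary terms list. A minimal sketch with plain dictionaries standing in for the index (the data is the test data above):

```python
# Plain dictionaries standing in for the index
docs = {
    "1": {"status": "published"},
    "2": {"status": "draft"},
    "3": {"status": "archived"},
}

# Step 1: fetch the source document and read the lookup field
lookup_values = [docs["2"]["status"]]  # ["draft"]

# Step 2: run an ordinary terms query with those values
matches = [doc_id for doc_id, d in docs.items() if d["status"] in lookup_values]
print(matches)  # ['2']
```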
7. Range Query - Range Search
Used for numeric and date range queries.
Basic Usage
{
"query": {
"range": {
"age": {
"gte": 18,
"lte": 65,
"boost": 2.0
}
}
}
}
Parameter Explanation:
- gt: Greater than.
- gte: Greater than or equal.
- lt: Less than.
- lte: Less than or equal.
- format: Date format; overrides the default format in the field mapping.
- relation: Only applicable to range type fields (e.g., date_range, integer_range), specifies the range matching method:
  - INTERSECTS (default): Matches if the query range overlaps the document range.
  - CONTAINS: The document range completely contains the query range.
  - WITHIN: The document range is completely within the query range.
- time_zone: Time zone setting, used to convert date values to UTC.
- boost: Adjusts the relevance score weight (default 1.0).
Test Data:
// Document 1
{ "age": 25, "name": "Alice" }
// Document 2
{ "age": 17, "name": "Bob" }
// Document 3
{ "age": 70, "name": "Charlie" }

Query Result (age range 18-65):
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | 25 is in range |
| Doc 2 | ❌ | 17 < 18 |
| Doc 3 | ❌ | 70 > 65 |
Date Range Query
Basic Date Example:
{
"query": {
"range": {
"created_date": {
"gte": "2024-01-01",
"lte": "2024-12-31",
"format": "yyyy-MM-dd"
}
}
}
}
Date Example using Date Math:
{
"query": {
"range": {
"created_date": {
"gte": "now-1d/d",
"lte": "now/d"
}
}
}
This query returns documents whose created_date falls between the start of yesterday (gte rounds down) and the end of today (lte rounds up).
Date Math Syntax Explanation:
- now: Current time (UTC).
- +1h: Plus 1 hour.
- -1d: Minus 1 day.
- /d: Round to the day.
- /M: Round to the month.
- /y: Round to the year.
Using Date Math Operator ||:
When a fixed date needs to be combined with date math (e.g., rounding), you must use || to connect them:
{
"query": {
"range": {
"created_date": {
"gte": "2024-01-01||/d", // Use || to connect date and rounding operation
"lte": "2024-12-31||/d"
}
}
}
}
Date Math Rounding Rules:
| Operator | Rounding Behavior | Example |
|---|---|---|
| gt | Round up to the first millisecond after the rounded range (exclusive) | 2014-11-18||/M → 2014-12-01T00:00:00.000Z |
| gte | Round down to the first millisecond (inclusive) | 2014-11-18||/M → 2014-11-01T00:00:00.000Z |
| lt | Round down to the last millisecond before the rounded range (exclusive) | 2014-11-18||/M → 2014-10-31T23:59:59.999Z |
| lte | Round up to the last millisecond of the range (inclusive) | 2014-11-18||/M → 2014-11-30T23:59:59.999Z |
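The four rounding rules can be reproduced for /M with a small helper (`round_month` is a hypothetical function for illustration; it only handles month rounding at millisecond precision):

```python
from datetime import datetime, timedelta

def round_month(value: str, op: str) -> datetime:
    """Sketch of Elasticsearch's ||/M rounding per range operator."""
    start = datetime.fromisoformat(value).replace(
        day=1, hour=0, minute=0, second=0, microsecond=0)
    nxt = (start.replace(year=start.year + 1, month=1) if start.month == 12
           else start.replace(month=start.month + 1))
    return {
        "gte": start,                             # first ms of the month
        "gt": nxt,                                # first ms after the month
        "lte": nxt - timedelta(milliseconds=1),   # last ms of the month
        "lt": start - timedelta(milliseconds=1),  # last ms before the month
    }[op]

print(round_month("2014-11-18", "gt"))  # 2014-12-01 00:00:00
```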
format Parameter Explanation
Role of the format parameter:
- Overrides the date format defined in the field mapping.
- Specifies the date format for the query parameters (gte, gt, lte, lt).
format Usage Rules:
1. If the date field mapping does not specify a format:
   - Elasticsearch supports several common date formats by default.
   - It will attempt to parse the value automatically.
2. If the index mapping specifies a format:
   - Query parameters (gte, lte, etc.) must match the format defined in the mapping.
   - Or override it with the format parameter in the query.
3. When using the format parameter:
   - All query parameters (gte, gt, lte, lt) must match the format specified by the format parameter.
   - Inconsistent formats will cause the query to fail or produce unexpected results.
Example:
// Index mapping definition
{
"mappings": {
"properties": {
"created_date": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss'Z'" // Define format
}
}
}
}
// ✅ Example 1: Query format matches mapping exactly
{
"query": {
"range": {
"created_date": {
"gte": "2024-01-01T00:00:00Z",
"lte": "2024-12-31T23:59:59Z"
}
}
}
}
// ❌ Example 2: Query format does not match mapping (only provides YMD)
// Error: Format mismatch, cannot parse
{
"query": {
"range": {
"created_date": {
"gte": "2024-01-01",
"lte": "2024-12-31"
}
}
}
}
// ✅ Example 3: Use format parameter to override mapping format
{
"query": {
"range": {
"created_date": {
"gte": "2024-01-01",
"lte": "2024-12-31",
"format": "yyyy-MM-dd" // Override mapping format
}
}
}
}
// ❌ Example 4: Query parameter format does not match format parameter
// Error: Query parameter format does not match format parameter
{
"query": {
"range": {
"created_date": {
"gte": "2024-01-01T00:00:00Z", // Contains time
"lte": "2024-12-31T23:59:59Z",
"format": "yyyy-MM-dd" // format only defines YMD
}
}
}
}

Time Zone Handling
Using the time_zone parameter:
{
"query": {
"range": {
"timestamp": {
"time_zone": "+01:00",
"gte": "2020-01-01T00:00:00",
"lte": "now"
}
}
}
}
Time Zone Conversion Explanation:
- The time_zone parameter accepts an ISO 8601 UTC offset (e.g., +01:00, -08:00).
- It can also use IANA time zone IDs (e.g., America/Los_Angeles, Asia/Taipei).
- In the example, 2020-01-01T00:00:00 uses UTC offset +01:00, so it is converted to 2019-12-31T23:00:00 UTC.
- Note: The time_zone parameter does not affect the value of now; now is always the current system time in UTC.
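The offset conversion is ordinary time zone arithmetic, which can be verified with Python's datetime:

```python
from datetime import datetime, timedelta, timezone

# "2020-01-01T00:00:00" interpreted in time_zone +01:00, then converted to UTC
local = datetime(2020, 1, 1, 0, 0, 0, tzinfo=timezone(timedelta(hours=1)))
utc = local.astimezone(timezone.utc)
print(utc.isoformat())  # 2019-12-31T23:00:00+00:00
```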
Missing Date Components
When the date format is incomplete, Elasticsearch fills in missing components with the following defaults (the year has no default and must always be provided):
| Component | Default Value |
|---|---|
MONTH_OF_YEAR | 01 |
DAY_OF_MONTH | 01 |
HOUR_OF_DAY | 23 |
MINUTE_OF_HOUR | 59 |
SECOND_OF_MINUTE | 59 |
NANO_OF_SECOND | 999_999_999 |
Official Documentation Example (Date part):
- If the format is yyyy-MM and the gt value is 2099-12:
- Elasticsearch converts it to 2099-12-01T23:59:59.999_999_999Z.
- It retains the provided year (2099) and month (12).
- It uses the default day (01), hour (23), minute (59), second (59), and nanosecond (999_999_999).
Actual Test Results (Time part):
The behavior of the time part differs from the official documentation explanation. Actual tests found:
✅ Cases that can be queried successfully:
{
"query": {
"range": {
"created_date": {
"gte": "2023-01-15T08", // Only provided up to the hour
"lte": "2023-01-15T08"
}
}
}
}
- Can find the data at 2023-01-15T08:30:00Z.
- This suggests Elasticsearch truncates both the document value and the query parameter to the same precision before comparing.
❌ Cases that cannot be queried:
// Case 1: Using gt and lte
{
"query": {
"range": {
"joined_date": {
"gt": "2023-01-15T08", // Greater than (exclusive)
"lte": "2023-01-15T08"
}
}
}
}
// Case 2: Using gte and lt
{
"query": {
"range": {
"joined_date": {
"gte": "2023-01-15T08",
"lt": "2023-01-15T08" // Less than (exclusive)
}
}
}
}
- Neither case finds 2023-01-15T08:30:00Z.
- Because gt and lt exclude the specified precision unit.
Behavior Inference:
- Date part: Follows the official documentation for filling in missing components.
- Time part: Formats the document and query parameters to the same precision, then compares.
- Example: "2023-01-15T08" treats all data within 2023-01-15T08:xx:xx as the same time unit.
- Using gte and lte includes data for the entire hour.
- Using gt or lt excludes that entire time unit.
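The inferred truncate-then-compare behavior can be expressed as a one-line sketch (`same_unit` is a hypothetical helper; it assumes both values share the same ISO 8601 layout):

```python
def same_unit(doc_ts: str, query_ts: str) -> bool:
    """Sketch of the observed behavior: truncate the document timestamp
    to the query value's precision, then compare."""
    return doc_ts[:len(query_ts)] == query_ts

print(same_unit("2023-01-15T08:30:00Z", "2023-01-15T08"))  # True
print(same_unit("2023-01-15T09:30:00Z", "2023-01-15T08"))  # False
```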
Recommended Approach:
To avoid unexpected query results due to precision issues, it is recommended to:
1. Explicitly specify the complete time format.
{ "query": { "range": { "created_date": { "gte": "2023-01-15T08:00:00Z", "lte": "2023-01-15T08:59:59Z" } } } }
2. Use Date Math rounding functionality.
{ "query": { "range": { "created_date": { "gte": "2023-01-15T08:00:00Z||/h", "lte": "2023-01-15T08:59:59Z||/h" } } } }
3. Use gte + lte when querying an entire time unit.
{ "query": { "range": { "created_date": { "gte": "2023-01-15T08", "lte": "2023-01-15T08" } } } }
Numeric vs String Differences
When using range query on a date field, numeric and string parsing methods differ:
// ❌ Error: Numeric values are interpreted as millisecond timestamps
{
"query": {
"range": {
"created_date": {
"gte": 2020 // Interpreted as 1970-01-01T00:00:02.020Z (2020 milliseconds after 1970)
}
}
}
}
// ✅ Correct: Strings are parsed according to format
{
"query": {
"range": {
"created_date": {
"gte": "2020" // Interpreted as 2020-01-01T00:00:00.000Z (Year 2020)
}
}
}
}

Pitfalls of mixing numeric and string values:
When gte/gt/lte/lt mix numeric and string values, different results occur:
// ❌ Error: Mixing numeric and date format strings
{
"query": {
"range": {
"created_date": {
"gte": 2022, // Numeric: interpreted as milliseconds
"lte": "2025-01-01" // String: interpreted as date format
}
}
}
}
// Error: String "2025-01-01" cannot be mixed with numeric, format error
// ✅ Correct: Mixing numeric and pure numeric strings
{
"query": {
"range": {
"created_date": {
"gte": 2025, // Numeric: interpreted as milliseconds
"lte": "2025" // Pure numeric string: interpreted as milliseconds
}
}
}
}
// Success: Both are treated as millisecond timestamps
// ✅ Correct: Uniformly use strings
{
"query": {
"range": {
"created_date": {
"gte": "2022", // String: interpreted as year
"lte": "2025-01-01" // String: interpreted as date
}
}
}
}
Important Principles:
- Use string format consistently to avoid parsing issues caused by mixing numeric and string values.
- Pure numeric strings (e.g., "2025") were treated as millisecond timestamps when mixed with a numeric value in the tests above, but as a year when used alone.
- Date format strings (e.g., "2025-01-01") are parsed according to the format.
- Numeric values are always interpreted as millisecond timestamps.
8. Exists Query - Field Existence Query
Queries whether a field exists (is not null).
Positive Query: Query field exists
{
"query": {
"exists": {
"field": "email"
}
}
}
Test Data:
// Document 1
{ "name": "Alice", "email": "[email protected]" }
// Document 2
{ "name": "Bob", "email": null }
// Document 3
{ "name": "Charlie" }
// Document 4
{ "name": "David", "email": "" }
// Document 5
{ "name": "Eve", "email": [] }

Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | email field exists and has value |
| Doc 2 | ❌ | email field is null |
| Doc 3 | ❌ | No email field |
| Doc 4 | ✅ | Empty string is still considered existing |
| Doc 5 | ❌ | Empty array is considered non-existent |
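The table's rules can be restated as a small predicate over _source values (`field_exists` is a hypothetical helper for illustration; it ignores the index/doc_values, ignore_above, and ignore_malformed special cases covered below):

```python
def field_exists(doc, field):
    """Sketch of exists-query semantics over _source values."""
    if field not in doc:
        return False
    value = doc[field]
    if value is None:
        return False
    if isinstance(value, list):
        return any(v is not None for v in value)
    return True  # note: an empty string still counts as existing

for doc in [{"email": "[email protected]"}, {"email": None}, {},
            {"email": ""}, {"email": []}]:
    print(field_exists(doc, "email"))  # True, False, False, True, False
```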
Negative Query: Query field does not exist
Use must_not combined with exists to query documents where the field does not exist.
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "email"
}
}
}
}
}

Special Case Explanation
In some cases, even if the field value exists in the original JSON document, the exists query will still determine it as "non-existent":
1. index: false and doc_values: false:
   - index: false: The field is not indexed and cannot be searched.
   - doc_values: false: The field does not store doc values and cannot be used for sorting, aggregation, or script access.
   - When both are set to false, the exists query considers the field non-existent.
2. Exceeding the ignore_above setting: For keyword fields, if the value's length exceeds the ignore_above limit set in the mapping, the value is not indexed.
// Mapping sets ignore_above: 10
{ "tags": "this_is_too_long" } // Length 16, will not be indexed
3. ignore_malformed with a format error: When the field type is numeric, date, etc., but the written data has the wrong format, and ignore_malformed: true is set in the mapping, the value is ignored and not indexed.
// Mapping sets price as an integer type, with ignore_malformed: true
{ "price": "not_a_number" } // Wrong format: not indexed, but the document write succeeds
These settings are mainly used to improve data processing fault tolerance, but be aware that they affect the query results of the exists query.
9. Prefix Query - Prefix Search
Queries documents starting with a specific string.
{
"query": {
"prefix": {
"username": {
"value": "admin"
}
}
}
}
Parameter Explanation:
- value: Prefix string.
- boost: Adjusts the relevance score weight.
- case_insensitive: Whether to ignore case, default is false.
- rewrite: Query rewrite method, used for performance tuning. When a prefix matches a large number of terms, this parameter controls how the matches are handled. Common values include constant_score (default; all matches get the same score) and top_terms_N (keeps only the top N terms). See the official documentation for details.
Test Data:
// Document 1
{ "username": "admin123" }
// Document 2
{ "username": "administrator" }
// Document 3
{ "username": "user456" }

Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | Starts with "admin" |
| Doc 2 | ✅ | Starts with "admin" |
| Doc 3 | ❌ | Does not start with "admin" |
10. Wildcard Query - Wildcard Search
Uses * and ? for fuzzy search (performance is poor, use with caution).
{
"query": {
"wildcard": {
"username": {
"value": "ad*n?",
"case_insensitive": true
}
}
}
}
Wildcard Explanation:
- *: Matches zero or more characters.
- ?: Matches exactly one character.
Parameter Explanation:
- value: Query string containing wildcards.
- wildcard: Alias for value with the same functionality; when both are present, the one that appears last takes precedence.
- boost: Adjusts the relevance score weight.
- case_insensitive: Whether to ignore case, default is false.
- rewrite: Query rewrite method.
Comparison of wildcard and value parameters
Test Data:
// Document 1
{ "username": "admin" }
// Document 2
{ "username": "administrator" }
// Document 3
{ "username": "admins" }
// Document 4
{ "username": "user456" }

Query Example: Using both wildcard and value
{
"query": {
"wildcard": {
"username": {
"wildcard": "admin",
"value": "ad*n?"
}
}
}
}
Parameter Explanation:
- wildcard: "admin": would exactly match "admin".
- value: "ad*n?": would match "ad" + zero or more characters + "n" + exactly one character.
Query Result (Using value: "ad*n?" because it is last):
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ❌ | "admin" has only 5 characters, does not match "ad*n?" pattern (requires one more character after n) |
| Doc 2 | ❌ | "administrator" has no "n" in the second-to-last position; the pattern must match the entire term |
| Doc 3 | ✅ | "admins" matches "ad*n?" pattern |
| Doc 4 | ❌ | Does not start with "ad" |
If wildcard is last:
{
"query": {
"wildcard": {
"username": {
"value": "ad*n?",
"wildcard": "admin"
}
}
}
}
Query Result (Using wildcard: "admin" because it is last):
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | Exactly matches "admin" |
| Doc 2 | ❌ | Not an exact match for "admin" |
| Doc 3 | ❌ | Not an exact match for "admin" |
| Doc 4 | ❌ | Not an exact match for "admin" |
Performance Notes:
- Avoid leading wildcards (e.g., *term or ?term), which force a scan over every term in the field.
- Wildcard queries have no caching mechanism and perform poorly.
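Python's fnmatch uses the same * and ? semantics and, like a wildcard query, requires the pattern to cover the whole term, so it is handy for checking patterns offline:

```python
import fnmatch

# fnmatch's * and ? mirror the wildcard query's operators, and the
# pattern must match the entire string, just like Lucene's term matching
print(fnmatch.fnmatchcase("admins", "ad*n?"))   # True: "ad" + "mi" + "n" + "s"
print(fnmatch.fnmatchcase("admin", "ad*n?"))    # False: no character after "n"
print(fnmatch.fnmatchcase("user456", "ad*n?"))  # False: wrong prefix
```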
11. Regexp Query - Regular Expression Search
Uses regular expressions for complex matching (performance is worst, use with caution).
{
"query": {
"regexp": {
"phone": {
"value": "09[0-9]{8}"
}
}
}
}
Parameter Explanation:
- value: Regular expression pattern.
- flags: Regular expression flags (e.g., COMPLEMENT, INTERVAL), used to enable additional operators.
- case_insensitive: Whether to ignore case, default is false.
- max_determinized_states: Maximum number of automaton states, default is 10000. This limits the complexity the regular expression engine will accept, preventing overly complex expressions from causing performance problems or memory exhaustion; an exception is thrown when the expression is too complex.
- rewrite: Query rewrite method.
Test Data:
// Document 1
{ "phone": "0912345678" }
// Document 2
{ "phone": "0987654321" }
// Document 3
{ "phone": "02-12345678" }

Query Result (Querying "09[0-9]{8}"):
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | Matches 09 start + 8 digits |
| Doc 2 | ✅ | Matches 09 start + 8 digits |
| Doc 3 | ❌ | Format does not match |
Flags Parameter Explanation and Examples
The flags parameter is used to enable additional operators for the Lucene regular expression engine. The following uses the same test data to demonstrate the effects of different flags.
Note: These symbols (~, #, <>, &, @) are Lucene-specific extensions, not standard general-purpose regular expression syntax.
Test Data:
// Document 1
{ "code": "abc123" }
// Document 2
{ "code": "abc456" }
// Document 3
{ "code": "xyz789" }
// Document 4
{ "code": "def123" }
// Document 5
{ "code": "abc" }

1. COMPLEMENT - Negation Pattern
Uses the ~ operator to negate the subsequent pattern.
{
"query": {
"regexp": {
"code": {
"value": "abc~123",
"flags": "COMPLEMENT"
}
}
}
}
Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ❌ | "abc123" contains the negated "123" |
| Doc 2 | ✅ | "abc456" matches "abc" followed by something that is not "123" |
| Doc 3 | ❌ | Does not start with "abc" |
| Doc 4 | ❌ | Does not start with "abc" |
| Doc 5 | ✅ | "abc" followed by nothing, which is not "123" |
Usage Notes for text fields:
Be particularly careful about the impact of tokenization when using ~ negation on text fields. For example:
// Assume name field is text type
// Data: { "name": "Wing Chou" }
// Query
{
"query": {
"regexp": {
"name": {
"value": "~(wing)",
"flags": "COMPLEMENT"
}
}
}
}
At first glance, it might seem this query would exclude "Wing Chou", but in reality:
- "Wing Chou" becomes ["wing", "chou"] after tokenization.
- ~(wing) negates "wing", but "chou" still matches.
- Therefore, "Wing Chou" still appears in the query results.
It is recommended to use negation operators on keyword fields to avoid unexpected results caused by tokenization.
2. INTERVAL - Numeric Range
Uses the <> operator to match numeric ranges.
{
"query": {
"regexp": {
"code": {
"value": "abc<100-200>",
"flags": "INTERVAL"
}
}
}
}
Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | "abc123" matches abc + number in 100-200 range |
| Doc 2 | ❌ | 456 in "abc456" is out of range |
| Doc 3 | ❌ | Does not start with "abc" |
| Doc 4 | ❌ | Does not start with "abc" |
| Doc 5 | ❌ | No number after "abc" |
3. INTERSECTION - AND Operation
Uses the & operator to match strings that match both patterns simultaneously.
{
"query": {
"regexp": {
"code": {
"value": "abc.+&.+123",
"flags": "INTERSECTION"
}
}
}
}
Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | "abc123" matches both "starts with abc" and "ends with 123" |
| Doc 2 | ❌ | "abc456" does not match "ends with 123" |
| Doc 3 | ❌ | "xyz789" does not match "starts with abc" |
| Doc 4 | ❌ | "def123" does not match "starts with abc" |
| Doc 5 | ❌ | "abc" does not match "ends with 123" |
4. ANYSTRING - Match Any String
Uses the @ operator to match any entire string.
Official Example (combined with exclusion logic):
{
"query": {
"regexp": {
"code": {
"value": "@&~(abc.+)",
"flags": "ANYSTRING|INTERSECTION|COMPLEMENT"
}
}
}
}
This example matches all strings that do not start with "abc".
Note: I cannot understand the actual difference between @&~(abc.+) and simply using ~(abc.+). If you need to use this operator, it is recommended to refer to the official documentation or perform actual tests to confirm the behavior.
5. EMPTY - Match No String
Uses the # operator to represent "matches no string", not even an empty string.
Difference from empty string:
// Empty string matches empty data
// ✅ Matches data where code field is empty string
{
"query": {
"regexp": {
"code": {
"value": ""
}
}
}
}
// # matches no data
// ❌ Matches no data (including empty string)
{
"query": {
"regexp": {
"code": {
"value": "#",
"flags": "EMPTY"
}
}
}
}

Actual Use Case (.NET Example):
Mainly used when dynamically combining regular expressions in code to avoid accidentally matching empty string data when there are no query conditions.
// .NET dynamic combination query condition example
List<string> conditions = new();
if (searchByAbc) {
conditions.Add("abc.*");
}
if (searchByXyz) {
conditions.Add("xyz.*");
}
// Use # to avoid matching empty string when no conditions exist
string pattern = conditions.Count > 0
? string.Join("|", conditions) // "abc.*|xyz.*"
: "#"; // Ensure no data is matched
SearchRequest searchRequest = new() {
Query = new RegexpQuery {
Field = "code",
Value = pattern,
Flags = conditions.Count > 0 ? "ALL" : "EMPTY"
}
};

Notes:
# is a special Lucene operator and cannot be used to match the literal "#" character.
// ❌ Error: Cannot be used to query data containing "#" character
// Query data { "code": "#" } → Cannot find
{
"query": {
"regexp": {
"code": {
"value": "#",
"flags": "EMPTY"
}
}
}
}
// ❌ Error: Cannot be used to query data containing "#" character
// Query data { "code": "#1" } → Cannot find
{
"query": {
"regexp": {
"code": {
"value": "#1",
"flags": "EMPTY"
}
}
}
}To match the literal "#" character, you need to use a backslash escape (see "Special Character Escaping" section below).
6. Combining Multiple Flags
You can use the | delimiter to enable multiple operators simultaneously.
{
"query": {
"regexp": {
"code": {
"value": "abc<100-500>",
"flags": "COMPLEMENT|INTERVAL"
}
}
}
}
Flag Support Options:
- ALL (default): Enables all optional operators.
- NONE: Disables all optional operators.
- COMPLEMENT: Enables the ~ negation operator.
- INTERVAL: Enables the <> range operator.
- INTERSECTION: Enables the & AND operator.
- ANYSTRING: Enables the @ any-string operator.
- EMPTY: Enables the # empty-language operator (matches no string).
Special Character Escaping
In the Lucene regular expression engine, the following characters have special meanings. If you want to use them as ordinary characters, you need to escape them with a backslash \:
Reserved Characters:
. ? + * | { } [ ] ( ) " \ #

Escaping Example:
// ❌ Error: + is a special character
// Query data { "phone": "+886912345678" } → Cannot find
{
"query": {
"regexp": {
"phone": {
"value": "+886.*"
}
}
}
}
// ✅ Correct: Use backslash to escape
// Query data { "phone": "+886912345678" } → Can find
{
"query": {
"regexp": {
"phone": {
"value": "\\+886.*"
}
}
}
}

Notes:
Because the backslash itself needs to be escaped in JSON strings, you need to use a double backslash \\ in JSON queries.
// Need to write "\\" in JSON to represent a single backslash
{ "value": "\\+886.*" } // Actual regular expression is "\+886.*"

Anchor Operator Limitations
Lucene's regular expression engine does not support anchor operators, such as ^ (beginning of line) or $ (end of line). To match a term, the regular expression must match the entire string.
This means:
- ^ and $ do not have their special anchor meaning.
- Regular expressions match the entire field value by default (equivalent to implicit anchors).
- Based on tests, ^ and $ are treated as ordinary characters, not anchor operators (using them returns no results).
Example:
// ✅ Correct: Match pattern directly
{ "value": "abc.*" } // Matches full string starting with abc
// ❌ Not recommended: Cannot find data for abc, inferred that it should try to match ^abc and abc$
{ "value": "^abc" }
{ "value": "abc$" }

Performance Notes:
- Regular expression query performance is extremely poor, should be avoided as much as possible.
- Consider other query types (e.g., prefix, wildcard) instead.
- If you must use it, limit the query scope and set a reasonable max_determinized_states.
- Avoid overly complex regular expressions to prevent hitting the max_determinized_states limit.
13. Fuzzy Query - Fuzzy Search
Error-tolerant query, allows spelling errors. Can be used for text and keyword fields.
Text field example:
{
"query": {
"fuzzy": {
"name": {
"value": "wing",
"fuzziness": "AUTO"
}
}
}
}
Effect: Queries terms within the allowed edit distance of wing.
Example:
- wing ✓ (exact match).
- wang ✓ (1 edit: i → a).
- weng ✓ (1 edit: i → e).
- kang ✗ (2 edits: w → k, i → a; exceeds the allowed distance).
Note: Because text fields are processed by an analyzer (tokenization, lowercasing):
- Index: Wing Chou → tokenized into [wing, chou] (lowercased, split).
- Query: wing → matches the term wing.
Parameter Explanation:
- value: Term to query (required).
- fuzziness: Allowed edit distance (AUTO, 0, 1, 2); AUTO is recommended.
  - AUTO: Automatically determines the edit distance based on term length.
  - 0: No edits allowed (equivalent to a term query).
  - 1: Allows 1 edit.
  - 2: Allows 2 edits.
- prefix_length: The first N characters must match exactly, default is 0.
- max_expansions: Maximum number of candidate terms to expand, default is 50.
- transpositions: Whether to allow adjacent character swaps (e.g., ab → ba), default is true.
Complete Example:
{
"query": {
"fuzzy": {
"title": {
"value": "quikc",
"fuzziness": "AUTO",
"prefix_length": 2,
"max_expansions": 10,
"transpositions": true
}
}
}
}
Parameter Effects:
prefix_length = 2 (the first 2 characters must match exactly):
- quick ✓ (starts with qu).
- quikc ✓ (starts with qu).
- xuick ✗ (starts with xu, does not match the prefix qu).
max_expansions = 10 (Max 10 candidate terms to expand):
Assuming the index has 20+ similar terms (quick, quit, quiz, quiet, quiche...), Elasticsearch will only take the first 10 candidate terms for searching, ignoring the rest.
Purpose: Limiting the expansion count can improve query performance, avoiding resource consumption from too many candidates.
transpositions = true (Allows adjacent character swaps):
- qiuck ✓ (ui ↔ iu, a swap counts as 1 edit).
- qukic ✓ (ki ↔ ik, a swap counts as 1 edit).
transpositions = false (Does not allow swaps):
{
"query": {
"fuzzy": {
"title": {
"value": "qiuck",
"fuzziness": 1,
"transpositions": false
}
}
}
}
- quick ✗ (without transpositions, the iu ↔ ui swap costs 2 edits — replace i → u and u → i — exceeding fuzziness = 1).
- With transpositions = true (the default), the same query would match quick, since the swap counts as a single edit.
Keyword field example:
{
"query": {
"fuzzy": {
"name.keyword": {
"value": "Wing Chow",
"fuzziness": "AUTO"
}
}
}
}
Effect: Performs fuzzy matching against the complete keyword value.
Example:
- Wing Chou ✓ (1 edit: w → u).
- Wing Chow ✓ (exact match).
- Wing Zhou ✓ (2 edits).
- John Wang ✗ (too many edits).
Usage Recommendations:
For text fields:
- Recommended: use a match query with the fuzziness parameter rather than a fuzzy query directly.
- Reason: the match query's input is processed by the analyzer (tokenization, lowercasing, etc.), which better fits typical search requirements.
Example Comparison:
Scenario: Index contains document name = "Wing Chou" (text field)
→ After analyzer processing, the terms in the index are: ["wing", "chou"] (lowercased, tokenized)
Example 1: fuzziness = 0 (Must match exactly)
// Not recommended: Use fuzzy directly (text field)
{
"query": {
"fuzzy": {
"name": {
"value": "Wing", // Does not pass through analyzer, matches "Wing" directly
"fuzziness": 0
}
}
}
}- Query term:
Wing(uppercase W). - Index term:
wing(lowercase w). - fuzziness = 0 means it must match exactly.
- Result: ✗ Cannot find (
Wing≠wing, case differs).
// Recommended: Use match (text field)
{
"query": {
"match": {
"name": {
"query": "Wing", // Passes through analyzer, becomes "wing"
"fuzziness": 0
}
}
}
}- Query term:
Wing→ Passes through analyzer →wing(lowercase). - Index term:
wing(lowercase). - fuzziness = 0 means it must match exactly.
- Result: ✓ Can find (matches exactly).
Example 2: fuzziness = 1 (Allows 1 character difference)
// Not recommended: Use fuzzy directly (text field)
{
"query": {
"fuzzy": {
"name": {
"value": "wing chuo", // Does not tokenize, queries "wing chuo" as a complete term
"fuzziness": 1
}
}
}
}- Query term:
wing chuo(complete string). - Index terms:
wing,chou(tokenized). - Result: ✗ Cannot find (index does not have "wing chuo" as a complete term).
// Recommended: Use match + fuzziness (text field)
{
"query": {
"match": {
"name": {
"query": "wing chuo", // Tokenizes into ["wing", "chou"], and performs fuzzy match on each term
"fuzziness": 1
}
}
}
}
- Query term: `wing chuo` → through the analyzer → `["wing", "chuo"]`.
- Index terms: `wing`, `chou`.
- Result: ✓ Found (`wing` matches exactly; `chuo` is 1 edit away from `chou`).
For keyword fields:
- A `fuzzy` query can be used directly.
- Reason: keyword fields are not processed by an analyzer, so fuzzy matching against the complete value is reasonable.
Summary:
- Text fields: prefer `match` + `fuzziness`.
- Keyword fields: a `fuzzy` query works fine.
- When terms must be matched directly (no analysis needed): use a `fuzzy` query.
Edit Distance Explanation:
Edit distance (Levenshtein distance) is the minimum number of operations required to convert one string into another. Allowed operations:
- Insert a character: `quic` → `quick` (insert k).
- Delete a character: `quickk` → `quick` (delete k).
- Replace a character: `quack` → `quick` (replace a → i).
- Swap adjacent characters (requires `transpositions = true`, the default): `qiuck` → `quick` (swap iu).
For detailed fuzziness parameter explanation, please refer to the "Match Query" section.
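The edit-distance rules above can be sketched in a few lines of Python. This is restricted Damerau-Levenshtein distance, the metric Elasticsearch's fuzziness is based on; the `transpositions` flag here mirrors the behavior of the query parameter of the same name.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Minimum edits (insert/delete/replace, optionally adjacent swap)
    to turn string a into string b."""
    m, n = len(a), len(b)
    # d[i][j] = edits needed to turn a[:i] into b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a character
                d[i][j - 1] + 1,         # insert a character
                d[i - 1][j - 1] + cost,  # replace (or match)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[m][n]

print(edit_distance("quic", "quick"))    # 1 (insert k)
print(edit_distance("qiuck", "quick"))   # 1 (swap iu)
print(edit_distance("qiuck", "quick", transpositions=False))  # 2
```

Note how disabling transpositions turns the `qiuck` case into 2 edits (two replacements), which is why a swapped pair can fall outside `fuzziness: 1` if transpositions are turned off.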
14. IDs Query - Query by Document ID
Queries directly by document _id.
{
"query": {
"ids": {
"values": ["1", "2", "3"]
}
}
}
Use Cases:
- Querying by known document IDs.
- Batch querying specific documents.
- Used in combination with other queries.
15. Nested Query - Nested Object Query
Used for querying nested type fields. Can only be used for nested types, not object types. Can preserve the relationships between fields within array elements.
Mapping Definition:
{
"mappings": {
"properties": {
"title": { "type": "text" },
"comments": {
"type": "nested",
"properties": {
"author": { "type": "keyword" },
"rating": { "type": "integer" },
"text": { "type": "text" }
}
}
}
}
}
Basic Query:
{
"query": {
"nested": {
"path": "comments",
"query": {
"bool": {
"must": [
{ "match": { "comments.author": "John" }},
{ "range": { "comments.rating": { "gte": 4 }}}
]
}
}
}
}
}
Parameter Explanation:
- `path`: path to the nested object (required).
- `query`: query to run against the nested objects (required).
- `score_mode`: how the scores of matching nested objects are combined into the document score; default is `avg`.
  - `avg`: average score (default).
  - `sum`: sum of scores.
  - `max`: maximum score.
  - `min`: minimum score.
  - `none`: do not score (score is 0).
- `ignore_unmapped`: whether to ignore the error when `path` is unmapped; default is `false`.
Test Data:
// Document 1
{
"title": "Product A",
"comments": [
{ "author": "John", "rating": 5, "text": "Great!" },
{ "author": "Jane", "rating": 3, "text": "OK" }
]
}
// Document 2
{
"title": "Product B",
"comments": [
{ "author": "John", "rating": 2, "text": "Poor" },
{ "author": "Bob", "rating": 5, "text": "Excellent" }
]
}
Query Result (author = "John" AND rating >= 4):
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | John's rating is 5 (>= 4) |
| Doc 2 | ❌ | John's rating is 2 (< 4) |
Why is Nested Query needed?
Problem: object type flattens arrays
If comments is an object type (default), Elasticsearch flattens the array, losing the relationships between elements:
// Original data
{
"title": "Product A",
"comments": [
{ "author": "John", "rating": 5 },
{ "author": "Jane", "rating": 3 }
]
}
// After flattening (relationship lost)
{
"title": "Product A",
"comments.author": ["John", "Jane"],
"comments.rating": [5, 3]
}
Example: Incorrect query result (using object type)
Querying products where "John gave 3 points":
{
"query": {
"bool": {
"must": [
{ "term": { "comments.author": "John" }},
{ "term": { "comments.rating": 3 }}
]
}
}
}
Result: ✓ Document 1 is found (❌ but this is wrong: John gave 5 points, not 3).
Reason: Elasticsearch only knows that author contains "John" and rating contains 3; it does not know that "John" corresponds to 5 points.
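The false positive can be reproduced outside Elasticsearch with a small Python sketch (not Elasticsearch code): once the array is flattened into per-field value lists, the author-to-rating pairing is gone.

```python
doc = {
    "title": "Product A",
    "comments": [
        {"author": "John", "rating": 5},
        {"author": "Jane", "rating": 3},
    ],
}

# object type: each sub-field becomes an independent list of values,
# which is all Elasticsearch retains after flattening
flattened = {
    "comments.author": [c["author"] for c in doc["comments"]],
    "comments.rating": [c["rating"] for c in doc["comments"]],
}

# "Did John give 3 points?" — each condition is checked against the
# whole list, so they can be satisfied by DIFFERENT comments
matches = ("John" in flattened["comments.author"]
           and 3 in flattened["comments.rating"])
print(matches)  # True — a false positive: John actually gave 5 points
```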
Solution: Use nested type + nested query
Define comments as a nested type, and Elasticsearch will internally store each array element as an independent sub-document (but it remains one document to the user):
// What you see: one document
{
"title": "Product A",
"comments": [
{ "author": "John", "rating": 5 },
{ "author": "Jane", "rating": 3 }
]
}
// Elasticsearch internal storage structure (hidden, user cannot see):
// ├─ Main document: { "title": "Product A" }
// ├─ Sub-document 1: { "author": "John", "rating": 5 }
// └─ Sub-document 2: { "author": "Jane", "rating": 3 }
Key points:
- To you, it is still one document.
- Elasticsearch internally handles sub-document relationships automatically.
- When querying, use nested query to ensure conditions are matched "within the same sub-document".
Querying products where "John gave 3 points":
{
"query": {
"nested": {
"path": "comments",
"query": {
"bool": {
"must": [
{ "term": { "comments.author": "John" }},
{ "term": { "comments.rating": 3 }}
]
}
}
}
}
}
Result: ✗ Not found (✓ correct: John gave 5 points, not 3).
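Continuing the earlier sketch, this is the behavior a nested query gives you: every condition is evaluated against each array element (sub-document) individually, so both conditions must hold within the same comment.

```python
doc = {
    "title": "Product A",
    "comments": [
        {"author": "John", "rating": 5},
        {"author": "Jane", "rating": 3},
    ],
}

def nested_match(comments, author, rating):
    # Both conditions must hold within the SAME comment,
    # which is what evaluating per sub-document guarantees.
    return any(c["author"] == author and c["rating"] == rating
               for c in comments)

print(nested_match(doc["comments"], "John", 3))  # False — correct this time
print(nested_match(doc["comments"], "John", 5))  # True
```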
score_mode parameter example:
When multiple nested objects in a document match the query, score_mode determines how to calculate the document's final score.
Test Data:
// Document 1
{
"title": "Product A",
"comments": [
{ "author": "Alice", "rating": 5, "text": "Excellent" },
{ "author": "Bob", "rating": 4, "text": "Good" },
{ "author": "Charlie", "rating": 3, "text": "Average" }
]
}
// Document 2
{
"title": "Product B",
"comments": [
{ "author": "David", "rating": 5, "text": "Perfect" }
]
}
Query:
{
"query": {
"nested": {
"path": "comments",
"score_mode": "max",
"query": {
"range": { "comments.rating": { "gte": 3 }}
}
}
}
}
Result Comparison (assuming each matching comment has a score of 1.0):
| Document | Number of matching comments | max | avg | sum | min |
|---|---|---|---|---|---|
| Doc 1 | 3 | 1.0 | 1.0 | 3.0 | 1.0 |
| Doc 2 | 1 | 1.0 | 1.0 | 1.0 | 1.0 |
Explanation:
- With `sum`, Document 1 scores higher (it has 3 matching comments).
- With `max` or `avg`, both documents score the same.
- This affects the sort order.
Purpose:
- `sum` ranks documents with more matching comments higher.
- `max` considers only the single most relevant comment.
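A minimal sketch of how `score_mode` combines per-comment scores into one document score (the scores here are made up for illustration, matching the 1.0-per-comment assumption in the table above):

```python
def combine(scores, mode="avg"):
    """Combine per-nested-object scores the way score_mode does."""
    if not scores or mode == "none":
        return 0.0
    if mode == "avg":
        return sum(scores) / len(scores)
    if mode == "sum":
        return sum(scores)
    if mode == "max":
        return max(scores)
    if mode == "min":
        return min(scores)
    raise ValueError(f"unknown score_mode: {mode}")

doc1 = [1.0, 1.0, 1.0]  # 3 matching comments
doc2 = [1.0]            # 1 matching comment

print(combine(doc1, "sum"), combine(doc2, "sum"))  # 3.0 1.0 — Doc 1 ranks higher
print(combine(doc1, "max"), combine(doc2, "max"))  # 1.0 1.0 — tie
```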
Advanced: Using inner_hits to fetch matching nested objects
Sometimes you don't just want to know "which document matches", but also "which nested object within the document matches".
{
"query": {
"nested": {
"path": "comments",
"query": {
"bool": {
"must": [
{ "term": { "comments.author": "John" }},
{ "range": { "comments.rating": { "gte": 4 }}}
]
}
},
"inner_hits": {}
}
}
}
Explanation:
- `inner_hits` is an object-type parameter.
- An empty object `{}` means the default settings are used.
- `inner_hits` supports various parameters (e.g., `size`, `from`, `_source`), but they are outside the scope of this note.
Return Result:
{
"hits": {
"hits": [
{
"_source": {
"title": "Product A",
"comments": [
{ "author": "John", "rating": 5, "text": "Great!" },
{ "author": "Jane", "rating": 3, "text": "OK" }
]
},
"inner_hits": {
"comments": {
"hits": {
"hits": [
{
"_source": {
"author": "John",
"rating": 5,
"text": "Great!"
}
}
]
}
}
}
}
]
}
}
Purpose: you can see exactly which comment matched the condition, rather than the entire comments array.
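Walking the response above from client code is just nested dict navigation. A sketch, with the response dict abbreviated to only the fields used here:

```python
# Abbreviated inner_hits response, shaped like the example above
response = {
    "hits": {"hits": [{
        "_source": {"title": "Product A"},
        "inner_hits": {"comments": {"hits": {"hits": [
            {"_source": {"author": "John", "rating": 5, "text": "Great!"}},
        ]}}},
    }]},
}

for hit in response["hits"]["hits"]:
    title = hit["_source"]["title"]
    # Only the nested objects that matched, not the whole comments array
    matched = [ih["_source"]
               for ih in hit["inner_hits"]["comments"]["hits"]["hits"]]
    print(title, matched)
```

Note that the inner key (`comments` here) is the nested `path` by default; naming it explicitly via `inner_hits.name` matters when the same path appears in multiple nested clauses.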
object vs nested quick comparison:
| Feature | object (default) | nested |
|---|---|---|
| Array handling | Flattened (relationship lost) | Maintained independently (relationship maintained) |
| Query method | General query (match, term, bool...) | Must use nested query |
| Use case | Single object or array not requiring relationships | Requires maintaining array element relationships |
| Performance | Better | Worse (extra overhead) |
Usage Recommendations:
Use nested when:
- The field is an array.
- You need to query multiple conditions "within the same array element".
- You need to maintain relationships between array elements.
Example Scenarios:
- Order product list (product name + price must correspond).
- Employee project experience (project name + role must correspond).
- Product reviews (reviewer + rating must correspond).
Use object when:
- The field is not an array.
- Relationships between array elements do not need to be maintained.
- You are pursuing better query performance.
Change Log
- 2025-11-04 Initial document creation.
